Meta-level Statistical Machine Translation
Authors
Abstract
We propose a simple and effective method for building a meta-level Statistical Machine Translation (SMT) system, called meta-SMT, for system combination. Our approach is based on the framework of Stacked Generalization, also known as Stacking, an ensemble learning algorithm widely used in machine learning. First, a collection of base-level SMT systems is generated to obtain a meta-level corpus. Then a meta-level SMT system is trained on this corpus. In this paper we address the question of how to adapt stacked generalization to SMT. We evaluate our approach on English-to-Persian machine translation. Experimental results show that our approach yields a significant improvement in translation quality of about 1.1 BLEU points over a phrase-based baseline.
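The two-step procedure above can be sketched in Python. This is a minimal sketch, not the authors' implementation. It assumes the standard stacking recipe of k-fold cross-validation: each base-level system translates only the fold it was not trained on, so the meta-level corpus pairs realistic base-level output with reference translations. At test time it assumes a base system retrained on the full corpus feeds the meta-level system. The names `train_toy_smt`, `k_fold_splits`, `build_meta_corpus`, and `train_meta_smt` are hypothetical, and the toy word-for-word model merely stands in for a real phrase-based SMT trainer.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]            # (source sentence, reference translation)
Translator = Callable[[str], str]


def train_toy_smt(corpus: List[Pair]) -> Translator:
    """Hypothetical stand-in for a real SMT trainer: maps each source token to
    the target token most often seen at the same position. Unknown tokens are
    copied through unchanged."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for src, tgt in corpus:
        for s, t in zip(src.split(), tgt.split()):
            counts[s][t] += 1
    table = {s: c.most_common(1)[0][0] for s, c in counts.items()}
    return lambda sentence: " ".join(table.get(w, w) for w in sentence.split())


def k_fold_splits(n: int, k: int) -> List[List[int]]:
    """Assign sentence i to fold i % k."""
    return [list(range(i, n, k)) for i in range(k)]


def build_meta_corpus(corpus: List[Pair], k: int,
                      train: Callable[[List[Pair]], Translator] = train_toy_smt) -> List[Pair]:
    """Pair each sentence's out-of-fold base-level translation with its reference."""
    if not 2 <= k <= len(corpus):
        raise ValueError("k must be between 2 and the corpus size")
    meta: List[Pair] = [("", "")] * len(corpus)
    for held_out in k_fold_splits(len(corpus), k):
        held = set(held_out)
        # Base-level system trained on the other k-1 folds only.
        base = train([p for i, p in enumerate(corpus) if i not in held])
        for i in held_out:
            src, ref = corpus[i]
            meta[i] = (base(src), ref)
    return meta


def train_meta_smt(corpus: List[Pair], k: int = 2,
                   train: Callable[[List[Pair]], Translator] = train_toy_smt) -> Translator:
    """The meta-level system learns to map base-level output to references;
    at test time it post-edits a base system trained on the full corpus."""
    meta_model = train(build_meta_corpus(corpus, k, train))
    base_full = train(corpus)
    return lambda sentence: meta_model(base_full(sentence))


corpus = [("a b", "x y"), ("a d", "x w")]
print(build_meta_corpus(corpus, k=2))   # [('x b', 'x y'), ('x d', 'x w')]
translate = train_meta_smt(corpus, k=2)
print(translate("a b"))                  # x y
```

Training on out-of-fold output is the key design choice: if each base system translated sentences it had already seen in training, the meta-level corpus would contain near-perfect translations, and the meta-level system would learn little about correcting real base-level errors.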
Similar resources
Neural and Statistical Methods for Leveraging Meta-information in Machine Translation
In this paper, we discuss different methods which use meta information and richer context that may accompany source language input to improve machine translation quality. We focus on category information of input text as meta information, but the proposed methods can be extended to all textual and non-textual meta information that might be available for the input text or automatically predicted...
Design and compilation of a specialized Spanish-German parallel corpus
This paper discusses the design and compilation of the TRIS corpus, a specialized parallel corpus of Spanish and German texts. It will be used for phraseological research aimed at improving statistical machine translation. The corpus is based on the European database of Technical Regulations Information System (TRIS), containing 995 original documents written in German and Spanish and their tra...
Meta-Structure Transformation Model for Statistical Machine Translation
We propose a novel syntax-based model for statistical machine translation in which meta-structure (MS) and meta-structure sequence (SMS) of a parse tree are defined. In this framework, a parse tree is decomposed into SMS to deal with the structure divergence and the alignment can be reconstructed at different levels of recombination of MS (RM). RM pairs extracted can perform the mapping between...
A new model for Persian multi-part words edition based on statistical machine translation
Multi-part words in English are hyphenated, with the hyphen separating the parts. Persian also has multi-part words. According to Persian morphology, the half-space character is needed to separate the parts of a multi-part word, but in many cases people incorrectly use a space character instead. This common incorrect use of space leads to some s...
The RWTH System Combination System for WMT 2011
RWTH participated in the System Combination task of the Sixth Workshop on Statistical Machine Translation (WMT 2011). For three language pairs, we combined 6 to 14 systems into a single consensus translation. A three-level meta-combination scheme combining six different system combination setups with three different engines was applied on the French–English language pair. Depending on the langua...